{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import temperature data from the DWD and process it\n",
    "\n",
    "This notebook pulls historical temperature data from the DWD server and formats it for future use in other projects. The data is delivered in a hourly frequencs in a .zip file for each of the available weather stations. To use the data, we need everythin in a single .csv-file, all stations side-by-side. Also, we need the daily average.\n",
    "\n",
    "To reduce computing time, we also crop all data earlier than 2007. \n",
    "\n",
    "Files should be executed in the following pipeline:\n",
    "* 1-dwd_konverter_download\n",
    "* 2-dwd_konverter_extract\n",
    "* 3-dwd_konverter_build_df\n",
    "* 4-dwd_konverter_final_processing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.) Download files from the DWD-API\n",
    "Here we download all relevant files from the DWS Server. The DWD Server is http-based, so we scrape the download page for all links that match 'stundenwerte_TU_.\\*_hist.zip' and download them to the folder 'download'. \n",
    "\n",
    "Link to the relevant DWD-page: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done\n"
     ]
    }
   ],
   "source": [
    "import requests\n",
    "import re\n",
    "from bs4 import BeautifulSoup\n",
    "from pathlib import Path\n",
    "\n",
    "# Set base values\n",
    "download_folder = Path.cwd() / 'download'\n",
    "base_url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/'\n",
    "\n",
    "\n",
    "# Initiate Session and get the Index-Page\n",
    "with requests.Session() as s:\n",
    "    resp = s.get(base_url)\n",
    "\n",
    "# Parse the Index-Page for all relevant <a href> \n",
    "soup = BeautifulSoup(resp.content)\n",
    "links = soup.findAll(\"a\", href=re.compile(\"stundenwerte_TU_.*_hist.zip\"))\n",
    "\n",
    "# For testing, only download 10 files\n",
    "file_max = 10\n",
    "dl_count = 0\n",
    "\n",
    "#Download the .zip files to the download_folder\n",
    "for link in links:\n",
    "    zip_response = requests.get(base_url + link['href'], stream=True)\n",
    "    # Limit the downloads while testing\n",
    "    dl_count += 1\n",
    "    if dl_count > file_max:\n",
    "        break\n",
    "    with open(Path(download_folder) / link['href'], 'wb') as file:\n",
    "        for chunk in zip_response.iter_content(chunk_size=128):\n",
    "            file.write(chunk)  \n",
    "    \n",
    "print('Done')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
